Goto

Collaborating Authors

 statistical mechanics theory


Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers

Neural Information Processing Systems

Despite the remarkable empirical performance of Transformers, their theoretical understanding remains elusive. Here, we consider a deep multi-head self-attention network, that is closely related to Transformers yet analytically tractable. We develop a statistical mechanics theory of Bayesian learning in this model, deriving exact equations for the network's predictor statistics under the finite-width thermodynamic limit, i.e., N,P\rightarrow\infty, P/N \mathcal{O}(1), where N is the network width and P is the number of training examples. Our theory shows that the predictor statistics are expressed as a sum of independent kernels, each one pairing different "attention paths", defined as information pathways through different attention heads across layers. The kernels are weighted according to a "task-relevant kernel combination" mechanism that aligns the total kernel with the task labels.


Eight challenges in developing theory of intelligence

Huang, Haiping

arXiv.org Artificial Intelligence

A good theory of mathematical beauty is more practical than any current observation, as new predictions of physical reality can be verified self-consistently. This belief applies to the current status of understanding deep neural networks including large language models and even the biological intelligence. Toy models provide a metaphor of physical reality, allowing mathematically formulating that reality (i.e., the so-called theory), which can be updated as more conjectures are justified or refuted. One does not need to pack all details into a model, but rather, more abstract models are constructed, as complex systems like brains or deep networks have many sloppy dimensions but much less stiff dimensions that strongly impact macroscopic observables. This kind of bottom-up mechanistic modeling is still promising in the modern era of understanding the natural or artificial intelligence. Here, we shed light on eight challenges in developing theory of intelligence following this theoretical paradigm.